Research Objective:
What is Artificial Intelligence? 


Natural Language Processing (NLP) 


Speech Recognition is the task of 
transforming spoken sound to text; and 
Text-to-Speech Synthesis is the task 
of transforming text to audible sound. 
Digital assistants interact with humans 
by listening to voice commands, execut- 
ing some tasks (fetching news updates, 
placing purchase orders, setting timers, 
etc.) and then communicating the re- 
sults in the form of human speech. 


Machine Translation is the task of 
transforming text in one language into 
another. 

A translation system can translate text from
more than one hundred languages.


■ The question “Can Machines Think?” was addressed by Alan
Turing with a thought experiment that became known as 
the Turing Test. In his proposed experiment, a computer 
program is intelligent if a human interrogator, after posing 
some written questions, cannot tell whether the written re- 
sponses come from a person or from a computer. Turing’s 
proposed test was an attempt at measuring intelligence by 
performance on some kind of open-ended behavioral task, 
rather than by philosophical speculation. 


■ Artificial Intelligence (AI) refers to computer programs that
process linguistic (textual and auditory) and visual informa- 
tion to perceive and interact with the world like humans. 


Object Detection is the task of local- 
izing (determining pixel locations using 
a bounding box) and classifying objects 
in an image. 

An autonomous driving system finds the 
surrounding vehicles by first localizing 
and classifying objects in images cap- 
tured by a front camera and then com- 
puting a distance estimate for each. 


Pose Estimation is the task of detect- 
ing the position and orientation of ob- 
jects or humans in a scene. 

A robotic arm predicts 3D position 
and orientation of rigid objects for the 
purpose of manipulating them (pick- 
ing, moving, and placing) in interactions 
with human collaborators. 


Image Synthesis is the task of gener- 
ating a modified version of an existing 
image or generating an entirely new one. 
A facial expression generation system 
generates various emotional expressions 
from a neutral face while preserving its 
identity. 


Computer Vision (CV) 
Image Transcription is the task of
observing unstructured visual represen- 
tation of some type of data and then 
transcribing the information into dis- 
crete textual form. 

An image captioning system observes 
the scene depicted in the figure and pro- 
vides a textual statement about its con- 
tent. 


Image Super-Resolution is the task 
of generating a higher resolution version 
of the original image. 

An image enhancement system enables 
fast acquisition of high-resolution MRI 
images by producing high-resolution re- 
constructions from low resolution acqui- 
sitions. 


Image Denoising is the task of recov- 
ering the original image signal from its 
corrupted form. 

An image restoration system recon- 
structs MRI scans from under-sampled 
inputs. 
Research Objective:


What is Machine Learning? 


■ Machine Learning (ML) refers to computer programs that
accomplish their tasks by learning from examples.


■ A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with
experience E.

The experience that is available to the program is represented
by the dataset, which itself is a collection of examples. An
example consists of a datapoint paired with its corresponding
label/target. A datapoint is a collection of features that
have been measured from some object or event.

For classification tasks the output is discrete-valued while for
regression tasks the output is continuous-valued. Therefore,
the performance measure is task specific. For classification
tasks the accuracy (i.e., the proportion of examples for which
the program produces the correct outputs) is used as the
performance measure while for regression tasks the average error
is used.
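
The two performance measures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "average error" is taken here to be the mean absolute error, which is one common choice among several.

```python
def accuracy(predictions, labels):
    """Proportion of examples whose predicted class equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def mean_absolute_error(predictions, targets):
    """Average absolute difference between predictions and targets
    (an assumed choice of 'average error')."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Classification: 3 of the 4 predicted classes match the labels.
print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))
# Regression: the absolute errors are 0.5, 0.0 and 1.0.
print(mean_absolute_error([2.5, 0.0, 3.0], [3.0, 0.0, 2.0]))
```

Accuracy should be maximized while the average error should be minimized, so the sign convention matters when either is used to compare models.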


■ ML is about creating value from data and its recent
advancements (i.e., the development of Deep Learning) have come from
the pursuit of artificial intelligence.

AI can't be achieved with explicit programming because it is in- 
feasible to manually formalize vast amounts of information in a 
form that computers can use. In contrast, ML offers the ability 
to learn a task by experiencing a collection of examples. 


Training: Two Guiding Principles 


The ML model, denoted by $f(\cdot, w)$, must perform its task
in the inference phase; it will process examples that are
considered to have been drawn from the data-generating
distribution $p_{data}$, i.e., $(x, y) \sim p_{data}$.

The ML model is constructed in the training phase using a
collection of available examples, i.e., the training set, that
are considered to have been drawn from the same $p_{data}$, i.e.,
$(x_{train}, y_{train}) \sim p_{data}$. There are two guiding principles for
constructing the model using the training set:


■ Empirical Risk Minimization


The optimal model, $f(\cdot, w^*)$, is one that performs its task
in the inference phase with minimal risk. The risk (also
referred to as the generalization error) is defined as
$R(w) \triangleq \mathbb{E}_{(x,y) \sim p_{data}}[L(f(x, w), y)]$. In other words, risk
is the expected value of the error (i.e., the output of the
loss function $L$) on examples that are drawn from $p_{data}$.
Therefore, $w^* = \arg\min_w \mathbb{E}_{(x,y) \sim p_{data}}[L(f(x, w), y)]$.


However, since it is impossible to access all examples
$(x, y) \sim p_{data}$, the risk must be minimized indirectly.
This is attempted by minimizing the empirical risk on
the training set, hoping that doing so will also reduce
the risk. The empirical risk is defined as
$R_{emp}(w) = \frac{1}{m} \sum_{i=1}^{m} L(f(x^{(i)}, w), y^{(i)})$, where $m$ is the number
of training examples. In other words, the empirical
risk is the average error for the examples of the training set.
According to the Law of Large Numbers, $R_{emp}(w)$ will
represent $R(w)$ for a large enough training set. Therefore,
$w^* = \arg\min_w \left( \frac{1}{m} \sum_{i=1}^{m} L(f(x^{(i)}, w), y^{(i)}) \right)$.
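
As a rough illustration of empirical risk minimization, the sketch below fits a one-parameter model to a toy training set by gradient descent. The model f(x, w) = w * x, the squared-error loss, the data and all function names are assumptions made for this example only.

```python
def empirical_risk(w, xs, ys):
    """Average squared error of the model f(x, w) = w * x over the training set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def empirical_risk_gradient(w, xs, ys):
    """Derivative of the empirical risk with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def minimize_empirical_risk(xs, ys, w=0.0, learning_rate=0.05, steps=200):
    """Plain gradient descent on the empirical risk."""
    for _ in range(steps):
        w -= learning_rate * empirical_risk_gradient(w, xs, ys)
    return w


# Hypothetical training set generated by y = 2x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]
w_star = minimize_empirical_risk(train_x, train_y)
print(w_star, empirical_risk(w_star, train_x, train_y))
```

The minimizer recovered here is only optimal for the training set; whether it also has low risk on unseen examples is a separate question.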
■ Likelihood Maximization


The optimal model, $f(\cdot, w^*)$, is one that maximizes the
likelihood, or equivalently, minimizes the negative-log-of-the-likelihood
over the entire training set. Therefore,
$w^* = \arg\min_w \left( -\log[p(X \mid w, y)] \right)$.


Generalization 


Once training is finished, the model enters the inference phase in 
which it will process previously unseen examples. If, after train- 
ing, the gap between the average error computed on the training 
set and the average error computed on the test set (represent- 
ing the generalization error) is minimal, it is said that the model 
generalizes well. Otherwise, the model overfits the training 
set. The model’s generalization ability can be improved through 
regularization, which is any modification to the learning al- 
gorithm that is intended to reduce the gap between the average 
errors computed on training and test sets. The modified guiding 
principles are referred to as the Structural Risk Minimiza- 
tion and the Posterior Maximization respectively. 
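
One common form of regularization adds a penalty on the size of the weights to the empirical risk. The sketch below assumes a one-parameter model f(x, w) = w * x, a squared-error loss and an L2 penalty; the names `weight_decay`, `fit` and `generalization_gap` are hypothetical and chosen for this example.

```python
def average_error(w, xs, ys):
    """Average squared error of f(x, w) = w * x on a set of examples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def regularized_risk(w, xs, ys, weight_decay):
    """Empirical risk plus an L2 penalty on the weight (assumed regularizer)."""
    return average_error(w, xs, ys) + weight_decay * w ** 2


def fit(xs, ys, weight_decay):
    """Closed-form minimizer of regularized_risk for this one-parameter model."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + weight_decay * len(xs)
    return numerator / denominator


def generalization_gap(w, train, test):
    """Average test error minus average training error."""
    return average_error(w, *test) - average_error(w, *train)


train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
test = ([4.0, 5.0], [8.0, 10.0])
for weight_decay in (0.0, 1.0):
    w = fit(*train, weight_decay)
    print(weight_decay, w, generalization_gap(w, train, test))
```

On this noise-free toy data the penalty only shrinks the weight; its benefit shows up when the training set is small or noisy relative to the capacity of the model.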
Research Objective:
AI Assisted Workflows


Motivation 


This line of inquiry is motivated by the fact that 
the large volumes of data (that we can easily 
store and manipulate these days) can be a rich 
resource for scientific discovery and develop- 
ment of technology if we can extract valuable 
information from them; and when it comes to
deriving valuable information from data, ML 
emerges as the key solution. 


Methodology


Devise and conduct experiments that aim 
to answer three fundamental questions: 
A novel use case for two data-driven models, namely, a Transformer and a convolutional
graph neural network (CGNN) is proposed. The authors propose to use these models for
emulating the dynamics of electromagnetic (EM) propagation and scattering. The
Transformer translates a past sequence into a future sequence by constructing
representations from the past and using it to predict the future, taking all of its own previous
predictions as input at each step of prediction. The CGNN updates the current state of
attribute vectors of each node by passing it information (messages) from all of its
neighbouring nodes. We train these models with FDTD simulations of plane waves
propagating and scattering from PEC objects. The authors demonstrate that, within the
bounds of computational resources, the Transformer can be utilised as a surrogate for
EM dynamics, providing a 14× speed-up, while the CGNN can be utilised as a next-frame
predictor, yielding a 9× speed-up. When comparing the accuracy of these two models
with the authors' previously developed Encoder-Recurrent-Decoder (ERD) model, it is
observed that the error for both the Transformer and the CGNN remains within the
same bound for the ERD model. To the best of the authors' knowledge, this work is the
first to utilise the Transformer as a surrogate for EM dynamics.


ABSTRACT In this paper, we propose the use of GANs as a learned, data-driven knowledge database that
can be queried for rapid synthesis of suitable antenna designs given a desired response. As an example,
we consider the problem of designing the Log-Periodic Folded Dipole Array (LPFDA) antenna for two
non-overlapping ranges of Q-factor values. By representing the antenna with the vector of its structural
parameters and considering each desirable range of the Q-factor as a class, we transform our problem
to that of generating new samples from a given class. We develop two alternative models, a Conditional 
Wasserstein GAN and a label-switched library of vanilla Wasserstein GANs and train them with a dataset 
of features and their associated labels (parameter vectors and Q-factor range). The main component of 
these models is a generator network that learns to map a normally distributed noise vector along with a 
binary label to the vector of parameters of candidate structures. We demonstrate that in inference mode, 
these models can be relied upon for fast generation of suitable designs. 


ABSTRACT In this paper, we propose a deep neural network based model to predict the time evolution
of field values in transient electrodynamics. The key component of our model is a recurrent neural
network, which learns representations of long-term spatial-temporal dependencies in the sequence of its
input data. We develop an encoder-recurrent-decoder architecture, which is trained with finite difference
time domain simulations of plane wave scattering from distributed, perfect electric conducting objects.
We demonstrate that the trained network can emulate a transient electrodynamics problem with more
than 17 times speed-up in simulation time compared to traditional finite difference time domain solvers.


INDEX TERMS Computer vision, electromagnetics, finite difference methods, machine learning, recurrent
neural networks, unsupervised learning.


INDEX TERMS Artificial neural networks, electromagnetics, machine learning. 
(1) Is it possible to obtain predictive models
from the available simulation and measurement 
data? 

(2) What are the domain-specific machine 
learning algorithms required to convert various 
datasets to modeling knowledge? 

(3) Once the modeling knowledge has been 
learned, how to seamlessly incorporate it into 
a data-driven predictive environment? 

As will become clear, these experiments recast 
engineering problems as ML/AI tasks. 


I. INTRODUCTION
NOWADAYS, data-driven methods play a variety of vital
roles across physics, engineering and sciences. With
the increasingly available computing power, data storage,
and high-resolution sensors, there is an enormous
opportunity to characterize the complex dynamical systems from
big data. Noticeable studies include discovering governing
equations from time-series data [1], [2], equation-free
modeling [3], [4], empirical dynamic modeling [5], [6],
emergent behavior modeling [7], and automated inference
of dynamics [8]-[10]. Recently, data-driven machine learning
methods have been applied to computational science and
fluid simulations [11]-[16]. In electromagnetic (EM)
applications, various advances are made in forward/inverse
scattering [17]-[19], direction-of-arrival estimation [20]-[23], radar
and remote sensing [24]-[26], image analysis [27]-[29], and
stochastic design [30].

In this work, we present the study of emulating transient
electrodynamic physics via a physics-informed deep neural
network (DNN). The network architecture comprises a
convolutional encoder, a recurrent neural network (RNN), and
a convolutional decoder. The encoder and decoder extract
the field and geometry (or object boundary) information,
and the RNN, implemented here as a convolutional LSTM
(Long Short-Term Memory RNN), replicates the time
evolution of wave physics. The trained network demonstrates
the approximation power of deep-learning methods, and can
be used for rapid time-domain electromagnetic analysis. A
comparison of the traditional finite difference time domain
(FDTD) solver and proposed DNN-based model is illustrated
in Fig. 1.


II. RESEARCH METHODOLOGY
A. MOTIVATION

In FDTD, EM field vectors are discretized with Yee grids
in the spatial domain; Faraday's and Ampere's laws are
enforced inherently. The amplitudes of the field vectors are
updated at each discrete time-step [31], [32]; therefore, the
output of the FDTD method is a sequence of time-evolving
grid values (which in the context of our work is interpreted
as a sequence of time-evolving images or a video). In
machine learning, RNNs are specialized networks for
processing sequential data [33], [34]. When training an RNN,
the output of the network is fed back as a part of the input
to predict the next value. Through recursively updating the
1 | INTRODUCTION


These days large volumes of data can be recorded from sensors
or generated using powerful computers with relative ease. If
valuable information can be extracted from them, these vast
amounts of data can be a rich resource for the digital economy.
A key technology that can help to achieve this is machine
learning (ML). Some industries, such as online retailers, social
media and virtual personal assistants, are already creating value
from its use. Other industries, such as pharmaceutical, energy
infrastructure, manufacturing, transportation (autonomous
vehicles) and healthcare are witnessing its emergent
applications. ML's ability to extract valuable information from large
volumes of data could also assist in scientific discovery and
technology development. To unlock the potential benefits of
ML to science and technology, extensive research is needed to
explore what algorithms are suitable and how they can be
applied [1-4].

In a previous work [5], we considered the field of
computational electromagnetics and focused on answering three
fundamental questions: (1) Is it possible to obtain predictive
models from the available simulation and/or measurement data?
(2) What are the domain-specific machine learning algorithms
required to convert various datasets to modelling knowledge? (3)
Once the modelling knowledge has been learnt, how to
seamlessly incorporate it into a data-driven predictive environment?
In that work, we successfully demonstrated that an
encoder-recurrent-decoder (ERD) model can predict many frames into


initial state of the electromagnetic (EM) fields. Furthermore, we
successfully incorporated our data-driven model in a domain
decomposition (DDM) parallelisation scheme thereby achieving,
I. INTRODUCTION

RAPID increase of computing power has transformed
computational physics. In recent years, taking
advantage of the vast amount of data that can be
generated using multi-physics simulations, Deep Neural Network
(DNN) based data-driven approaches have produced fast
surrogate models for conventional numerical methods in
both computational fluid dynamics [1]-[6] and
computational electromagnetics [7]-[11]. DNN based approaches
are also being applied to inverse problems, i.e.,
designing a suitable structure given a desired (mechanical or
electromagnetic) response [12], with notable advances
in the design of airfoils [13] and RF/nanophotonic
metasurfaces [14]-[18].

In this work, we describe the inverse design of the
LPFDA antenna for two non-overlapping Q-factor ranges as
querying a learned, data-driven, knowledge database for
suitable designs given the desired range. We utilize Generative
Adversarial Networks (GANs) to generate new samples that
are similar to a dataset of LPFDA antennas that was put
together from some favorable designs. By training a GAN
on this dataset, it learns an estimate of its data-generating
distribution and therefore, the GAN effectively captures the
knowledge that is gained from many design iterations and
optimization rounds. The Q-factor becomes an important
consideration in the miniaturization of antennas and given
the extensive amount of time that is required to search the
space of all possible designs for the near-optimal candidates,
it is reasonable to investigate the feasibility of using GANs
to accelerate this process. From a machine learning
perspective, we view each desirable range of Q-factor as a class and
cast the problem of antenna design in terms of generating
samples (in the form of a vector of structural parameters)
from each class. We choose GANs because they can
produce new samples from the same distribution as the dataset
that they were trained with, and draw on a large number of
suitable designs as our training dataset and experiment with
two alternative architectures. The first, depicted in Fig. 1(b),
is a Conditional Generative Adversarial Network (CGAN).
In this model, the conditional generator accepts a noise
vector and a binary class label at its input and maps them
to a vector of structural parameters from the class that is
Feed Forward Networks (FFNs)


■ Artificial Neural Networks (ANNs) are
a category of ML models. They are
computational models that are composed of
multiple processing layers intended to learn
representations of data at multiple levels of
abstraction.

[Figure panel: Back Propagation]


■ Each layer consists of several units (also
referred to as neurons) that independently 
process that layer's input vector. A unit 
produces its output by multiplying each el- 
ement of the input vector by a dedicated 
weight, summing them together along with 
a bias and, finally, applying a non-linear 
function to the total sum. 

Since each unit processes all the elements of 
the input, this type of layer is referred to as 
a Fully-Connected layer. 


Non-Linear Functions 

■ The output of a layer, which is the vector
containing the outputs from all the units in 
that layer, is computed as the multiplication 
of the expanded weight matrix and the 
expanded input vector. The output of 
each layer becomes the input to the next 
layer and the computations continue. 


■ In the training phase, the learning algorithm
will need to compute the gradient of the
cost w.r.t. the weights and biases. For this
purpose, the δ vector at the input of each
layer is computed as the multiplication of
the weight matrix with the scaled δ vector
at the output of that layer.
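
A minimal sketch of a fully-connected layer follows. It assumes that the "expanded" weight matrix carries the bias of each unit as an extra last column, that the "expanded" input vector has a constant 1 appended, that the non-linear function is tanh, and that "scaled" means multiplied element-wise by the derivative of the non-linear function; these are interpretations, and the function names are hypothetical.

```python
import math


def dense_forward(expanded_weights, inputs):
    """Forward pass: one row of expanded_weights per unit, bias in the last column."""
    expanded_input = list(inputs) + [1.0]
    outputs = []
    for row in expanded_weights:
        total = sum(w * a for w, a in zip(row, expanded_input))
        outputs.append(math.tanh(total))
    return outputs


def dense_backward(expanded_weights, outputs, delta_out):
    """Delta at the layer input from the delta at its output.

    The output delta is first scaled by the tanh derivative (1 - output^2),
    then multiplied by the transposed weight matrix (bias column excluded).
    """
    scaled = [d * (1.0 - o * o) for d, o in zip(delta_out, outputs)]
    number_of_inputs = len(expanded_weights[0]) - 1
    return [sum(row[k] * s for row, s in zip(expanded_weights, scaled))
            for k in range(number_of_inputs)]


# Two units, three inputs; the last entry of each row is the bias.
weights = [[0.5, -1.0, 0.0, 0.1],
           [1.0, 1.0, 1.0, -3.0]]
outputs = dense_forward(weights, [1.0, 0.5, 2.0])
print(outputs)
print(dense_backward(weights, outputs, [1.0, 1.0]))
```

Stacking layers amounts to feeding `outputs` into the next call of `dense_forward`, and calling `dense_backward` in the reverse order.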
Convolutional Layers


■ A convolutional layer is a layer which is
specialized for processing inputs that are in the
form of a grid of values.

$W_{out} = \left\lfloor \frac{W_{in} + 2P_w - D_w(F_w - 1) - 1}{S_w} \right\rfloor + 1$

$H_{out} = \left\lfloor \frac{H_{in} + 2P_h - D_h(F_h - 1) - 1}{S_h} \right\rfloor + 1$

[Figure panels: Input; Same/Valid Convolution; Transpose Convolution]


■ This type of layer can learn a single model
for processing local neighborhoods of all the
positions across the input grid.

This capability results from the layer’s 
sparse interactions with its inputs and its 
output’s equivariance to translations. 
The first property is achieved by restricting 
the interactions of the layer’s set of weights 
(which are much smaller in size compared to 
the input) to only a local region of the input. 
The second property is achieved by weight 
sharing between different local regions. 


Padding refers to insertion of zeros to both 
sides (beginning and end) of the input in 
each dimension. 

Stride refers to the distance between two 
consecutive positions of the filter. 

Dilation refers to the inflation of the filter
by inserting spaces between filter elements;
this provides a convenient way to increase
the receptive field (i.e., the local region of
interaction) of the filter without increasing
its size.

■ Each element of the output is produced by
sliding the set of weights, i.e., the filter,
across the input. For each position, the
weights in each channel of the filter are
multiplied by their corresponding elements
in the input (i.e., same channel and same
position in the input) and the results are
summed together. The sums from all the
channels are added together along with a
bias. Finally, a non-linear function is
applied to the total sum.

[Figure panels: example convolutions annotated with their padding ($P_w = P_h$), stride ($S_w = S_h$) and dilation ($D_w = D_h = 1$); one panel uses $P_w = P_h = 1$, $S_w = S_h = 2$, $D_w = D_h = 1$]
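
The roles of padding, stride and dilation can be illustrated with a one-dimensional, single-channel sketch. The function names are hypothetical, the output-size rule is the standard one with a floor division, and the non-linear function is omitted so that the arithmetic stays easy to follow.

```python
def conv_output_size(size_in, filter_size, padding=0, stride=1, dilation=1):
    """Number of output positions along one dimension."""
    return (size_in + 2 * padding - dilation * (filter_size - 1) - 1) // stride + 1


def conv1d(signal, kernel, bias=0.0, padding=0, stride=1, dilation=1):
    """Slide a single-channel filter across a 1-D input."""
    # Padding: insert zeros at the beginning and at the end of the input.
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    size_out = conv_output_size(len(signal), len(kernel), padding, stride, dilation)
    output = []
    for position in range(size_out):
        start = position * stride  # stride: distance between filter positions
        # Dilation: filter elements are spread `dilation` samples apart.
        total = sum(w * padded[start + k * dilation] for k, w in enumerate(kernel))
        output.append(total + bias)
    return output


print(conv_output_size(5, 3, padding=1, stride=2))                # 3
print(conv1d([1, 2, 3, 4, 5], [1, 0, -1], padding=1, stride=2))   # [-2.0, -2.0, 4.0]
```

The same three filter weights are reused at every position, which is the weight sharing that gives the layer its equivariance to translations.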
Deep Learning:


Recurrent Layers 


■ A recurrent layer is a layer which is
specialized for processing inputs that are in the
form of sequences.


■ This type of layer can learn a single model
for processing all the positions of an
input sequence. This capability results from
weight sharing across all the positions
of the input sequence which is achieved
through recurrent updates.

A recurrent update refers to the applica- 
tion of the same forward propagation oper- 
ation (i.e., multiplication by the same set of 
weights) to produce the current position in 
the output from the current input and the 
previous position in the output. 
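
A recurrent update can be sketched for scalar inputs and a single unit as follows. The tanh non-linearity, the zero initial output and the parameter names are assumptions made for this illustration.

```python
import math


def recurrent_layer(inputs, w_input, w_recurrent, bias, initial_output=0.0):
    """Apply the same update, with the same weights, at every sequence position."""
    outputs = []
    previous = initial_output
    for x in inputs:
        # Current output depends on the current input and the previous output.
        current = math.tanh(w_input * x + w_recurrent * previous + bias)
        outputs.append(current)
        previous = current
    return outputs


print(recurrent_layer([1.0, 0.0, 0.0], w_input=1.0, w_recurrent=0.5, bias=0.0))
```

Although only the first input is non-zero in this example, its influence is still visible in the later outputs, carried forward through the recurrent weight.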


■ The LSTM is the most suitable architecture
for learning long-term dependencies. It
comprises a Constant Error Carousel
(CEC) and three gates, namely, the
forget gate, the input gate and the output
gate. The CEC acts as a memory
mechanism which, selectively, holds the value from
the previous input.

As such, the CEC provides an unatten- 
uated/unmagnified path for the gradients 
of the cost as they propagate backwards 
through time and, therefore, helps to miti- 
gate the vanishing/exploding gradients 
problem in long sequences. 
Deep Learning:
Deep Feed Forward Networks


■ The Universal Approximation Theorem (UAT) states that
a FFN with a linear output layer and at least one hidden layer
which has any non-linear activation function (i.e., either a ReLU
or a 'squashing' function such as the sigmoid) can approximate
any Borel-measurable function (i.e., a continuous function
defined on a closed and bounded subset of $\mathbb{R}^n$) with any desired
non-zero amount of error, provided that the FFN is given enough
hidden units.

■ While the UAT guarantees the existence of the function that
would generalize to unseen examples, it provides no guarantees
that the learning algorithm will be able to learn that function.
Specifically, learning can fail either due to the optimization
algorithm's inability to find the set of parameters that
corresponds to the desired function or due to overfitting.

■ While the UAT guarantees the existence of a FFN large enough
to achieve any desired degree of accuracy, it provides no
indication as to how large it should be.

■ In summary, a FFN with a single hidden layer is sufficient to
represent any function but the layer may fail to learn
or to generalize and, also, it may be infeasibly large.

■ FFNs are structured as deep stacks of layers, not just to
alleviate the problems that the shallow networks present, but also
to arrange for the sought-after function to be constructed by
composition of several simpler ones.

■ Training deep FFNs is fraught with difficulties. A major
obstacle for the learning algorithm is the failure of the
back-propagation procedure. The main failure mechanisms are the
vanishing/exploding gradients and the shattered gradients.


Vanishing/Exploding Gradients


When the gradients of the cost w.r.t. the weights and biases
of the $l$-th layer, i.e., $\frac{\partial C}{\partial w^{(l)}}$, become so small that they can't
effect meaningful change in that layer's parameters, the learning
algorithm is said to suffer from vanishing gradients.
Likewise, when they become so large that they cause unstable
change, the learning algorithm is said to suffer from exploding
gradients.

■ Non-saturating Non-linear Functions: ReLU
In computation of $\delta$ for deeper (i.e., lower) layers, the
derivatives of the non-linear function from all the previous layers are
multiplied together. Since for the ReLU $f'(z > 0) = 1$, these
multiplications will not attenuate the elements of the $\delta$.
While $f'(z < 0) = 0$ does contribute to zero elements in $\delta$
and, consequently, in the gradients, the back-propagation procedure
will work as long as the gradients can propagate along some
paths; in other words, as long as $z > 0$ for some of the units
in each layer.
In contrast to the ReLU, both the hyperbolic tangent and the
sigmoid functions cause significant attenuation in elements of
the $\delta$ in deeper layers even when their inputs are from a small
interval centered at zero (i.e., are not from the saturated
regions).

[Figure panels: Non-Linear Functions; Derivatives of the Non-Linear Functions]
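
The attenuation caused by multiplying activation derivatives across layers can be checked numerically. The sketch below is a simplification: it ignores the weight matrices and assumes every layer sees the same pre-activation value of 0.5, which lies well inside the non-saturated region; the function names are chosen for this example.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def tanh_derivative(z):
    return 1.0 - math.tanh(z) ** 2


def relu_derivative(z):
    return 1.0 if z > 0 else 0.0


def derivative_product(derivative, pre_activations):
    """Product of the activation derivatives along one path through the layers."""
    product = 1.0
    for z in pre_activations:
        product *= derivative(z)
    return product


pre_activations = [0.5] * 10  # ten layers, hypothetical pre-activation values
for name, derivative in [("sigmoid", sigmoid_derivative),
                         ("tanh", tanh_derivative),
                         ("relu", relu_derivative)]:
    print(name, derivative_product(derivative, pre_activations))
```

With these values the ReLU product stays at exactly 1, while the sigmoid product is smaller by several orders of magnitude after only ten layers.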
Deep Feed Forward Networks


■ Weight Initialization


Proper weight initialization ensures that, at the start of
training, the magnitudes of $a^{(l)}$ and $\delta^{(l)}$ will not attenuate to zero
or grow very large. With proper initialization, in the forward
pass, starting from an input with zero mean and unit
variance, the variance of the elements of the output of each layer
remains the same as that for the layer below it. Likewise, in
the back-propagation, the variance of the elements of the $\delta$ at
the output of each layer remains the same as that for the layer
above it.


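He Initialization


For layers with ReLU activation, the weights can be drawn
from either a normal distribution $w^{(l)} \sim \mathcal{N}\left(0, \frac{2}{n_{in}^{(l)}}\right)$, in which
$n_{in}^{(l)}$ denotes the size of the input to the $l$-th layer, or a uniform
distribution $w^{(l)} \sim U\left[-\sqrt{\frac{6}{n_{in}^{(l)}}}, \sqrt{\frac{6}{n_{in}^{(l)}}}\right]$.


Glorot Initialization


For layers with hyperbolic tangent activation, the
weights can be drawn from either a normal distribution
$w^{(l)} \sim \mathcal{N}\left(0, \frac{2}{n_{in}^{(l)} + n_{out}^{(l)}}\right)$ or a uniform distribution
$w^{(l)} \sim U\left[-\sqrt{\frac{6}{n_{in}^{(l)} + n_{out}^{(l)}}}, \sqrt{\frac{6}{n_{in}^{(l)} + n_{out}^{(l)}}}\right]$,
in which $n_{out}^{(l)}$ denotes the size of the output of the $l$-th layer.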
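
The He and Glorot schemes can be sketched with the standard library alone. The formulas used are the commonly published ones (variance 2/fan_in for He, 2/(fan_in + fan_out) for Glorot, with matching uniform limits); `fan_in`, `fan_out` and the function names are labels chosen for this example.

```python
import math
import random


def he_normal_std(fan_in):
    return math.sqrt(2.0 / fan_in)


def he_uniform_limit(fan_in):
    return math.sqrt(6.0 / fan_in)


def glorot_normal_std(fan_in, fan_out):
    return math.sqrt(2.0 / (fan_in + fan_out))


def glorot_uniform_limit(fan_in, fan_out):
    return math.sqrt(6.0 / (fan_in + fan_out))


def sample_weights(fan_in, fan_out, scheme, rng):
    """Draw a fan_out x fan_in weight matrix with the requested scheme."""
    if scheme == "he_normal":
        draw = lambda: rng.gauss(0.0, he_normal_std(fan_in))
    elif scheme == "he_uniform":
        limit = he_uniform_limit(fan_in)
        draw = lambda: rng.uniform(-limit, limit)
    elif scheme == "glorot_normal":
        draw = lambda: rng.gauss(0.0, glorot_normal_std(fan_in, fan_out))
    elif scheme == "glorot_uniform":
        limit = glorot_uniform_limit(fan_in, fan_out)
        draw = lambda: rng.uniform(-limit, limit)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [[draw() for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
weights = sample_weights(100, 200, "he_normal", rng)
flat = [w for row in weights for w in row]
print(sum(w * w for w in flat) / len(flat))  # close to 2 / 100
```

A uniform distribution on [-a, a] has variance a^2 / 3, which is why the uniform limits use 6 where the normal variances use 2.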


Shattered Gradients


When the gradients of the cost w.r.t. the weights and biases
of the $l$-th layer, i.e., $\frac{\partial C}{\partial w^{(l)}}$, exhibit unexpected behavior, the
learning algorithm is said to suffer from shattered gradients.
This effect arises from the sequential dependence of layer
outputs and also the simultaneous update of all layers. For example,
from the chain rule,

$\frac{\partial C}{\partial w_{jk}^{(2)}} = a_k^{(1)} \frac{\partial a_j^{(2)}}{\partial z_j^{(2)}} \sum_{i} w_{ij}^{(3)} \frac{\partial a_i^{(3)}}{\partial z_i^{(3)}} \frac{\partial C}{\partial a_i^{(3)}}$.

The infinitesimal change in $w^{(2)}$ is computed under the assumption
that its constituent terms don't change as $w^{(2)}$ undergoes this
change. However, this assumption doesn't hold given that $a^{(2)}$,
$a^{(3)}$ and, ultimately, $C$ are themselves dependent on $w^{(2)}$ and
also that $w^{(2)}$ and all $w^{(3)}$ are updated simultaneously.


■ Shortcut Connections: The Residual Layer


Adding shortcut (skip) connections to layers results in
multiple paths of different lengths (including the direct/identity
path) between the input and the output.

In general, gradients through shorter paths will be better be- 
haved. Since both the identity term and various short chains 
of derivatives will contribute to the derivative for each layer, 
networks with shortcut connections suffer less from shattered 
gradients. 
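
The effect of the identity path on the derivative can be seen in a scalar toy model. The sketch below compares a plain stack of tanh layers with a residual stack; the scalar weights, the input value and the use of a finite-difference derivative are all assumptions made to keep the example small.

```python
import math


def plain_stack(x, weights):
    """Each layer replaces its input: x <- tanh(w * x)."""
    for w in weights:
        x = math.tanh(w * x)
    return x


def residual_stack(x, weights):
    """Each layer adds to its input through the shortcut: x <- x + tanh(w * x)."""
    for w in weights:
        x = x + math.tanh(w * x)
    return x


def numerical_derivative(function, x, weights, step=1e-6):
    """Central finite-difference estimate of d(output) / d(input)."""
    return (function(x + step, weights) - function(x - step, weights)) / (2 * step)


weights = [0.1] * 20  # twenty layers with small, positive weights (hypothetical)
print(numerical_derivative(plain_stack, 0.5, weights))
print(numerical_derivative(residual_stack, 0.5, weights))
```

In the plain stack the derivative is a product of twenty small factors and is numerically zero, whereas in the residual stack every factor has the form 1 + (layer derivative), so the identity term keeps the product from collapsing.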


Batch Normalization 


The learning algorithm can also greatly benefit from learning 
the moments of distribution (i.e., the mean and variance) of the 
elements of the input to each layer alongside the weights and
biases of that layer. 

The batch normalization (BN) procedure, in the first step, stan- 
dardizes each element of the layer's input with the mean and 
variance that is computed over the current batch for that ele- 
ment and, in the second step, scales and shifts the standardized 
elements with learned parameters. 

In the training phase, for $a_k^{(l-1)}$, the BN procedure computes
$\mu_{k,B}$ and $\sigma_{k,B}^2$ from the entire batch and then transforms it to
$\tilde{a}_k^{(l-1)} = \gamma \times \hat{a}_k^{(l-1)} + \beta$, in which
$\hat{a}_k^{(l-1)} = \frac{a_k^{(l-1)} - \mu_{k,B}}{\sqrt{\sigma_{k,B}^2}}$.

In the inference phase, $\hat{a}_k^{(l-1)}$ is computed using the average of
the values computed for $\mu_{k,B}$ and $\sigma_{k,B}^2$ during the training.
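
The two steps of batch normalization can be sketched for a single input element observed across one batch. The names `gamma` (learned scale) and `beta` (learned shift) follow common usage, and the small constant `epsilon`, added for numerical stability, is an assumption of this sketch.

```python
import math


def batch_norm_train(batch, gamma, beta, epsilon=1e-5):
    """Standardize one input element over the batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((a - mean) ** 2 for a in batch) / n
    normalized = [(a - mean) / math.sqrt(variance + epsilon) for a in batch]
    outputs = [gamma * a_hat + beta for a_hat in normalized]
    return outputs, mean, variance


def batch_norm_inference(value, mean, variance, gamma, beta, epsilon=1e-5):
    """Use statistics stored during training instead of batch statistics."""
    return gamma * (value - mean) / math.sqrt(variance + epsilon) + beta


outputs, mean, variance = batch_norm_train([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0)
print(outputs, mean, variance)
print(batch_norm_inference(2.5, mean, variance, gamma=2.0, beta=3.0))
```

In practice the stored mean and variance are running averages over many training batches rather than the statistics of a single batch as used here.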
TensorFlow

■ Developed by the Google Brain team
■ First released in 2015
■ TensorFlow 2 was released in 2019


PyTorch

■ Developed at Facebook's AI Research (FAIR) lab
■ First released in 2016 (version 1.0 followed in 2018)
Training

    import tensorflow as tf

    # (1) build the network
    seq_network_1=get_seq_network_1(arg_1, ...)
    # (2) wrap it in the learning algorithm and compile
    model=ml_algorithm(seq_network_1=seq_network_1,
                       ...)
    model.compile(optimizer=tf.keras.optimizers.Adam(...),
                  loss=...,
                  <metrics>=...)
    # (3) callbacks
    callbacks=[callback_custom_checkpoint(...),
               callback_custom_monitor(...),
               callback_custom_history(...),
               tf.keras.callbacks.TensorBoard(...)]
    # (4) write the dataset to disk
    save_data_to_tfrecords_format(arg_1,...)
    # (5) read it back as tf.data datasets
    data_train=read_tfrecords(mode='train',...)
    data_validation=read_tfrecords(mode='validation',...)

    model.fit(x=data_train, ...,
              validation_data=data_validation,
              callbacks=callbacks,
              ...)


(1)

    import tensorflow as tf

    def get_seq_network(arg_1,...):
        X=tf.keras.Input(shape=..., ...)
        x=tf.keras.layers.Dense(...)(X)
        x=tf.keras.layers.BatchNormalization(...)(x)
        x=tf.keras.layers.ReLU(...)(x)
        return tf.keras.Model(inputs=X, outputs=x, name=...)


(2)

    import tensorflow as tf

    class ml_algorithm(tf.keras.Model):

        def __init__(self,seq_network_1,...):
            super().__init__()
            self.network_1=seq_network_1

        def compile(self,optimizer,loss,<metrics>):
            super().compile()
            self.optimizer=optimizer
            self.loss=loss
            self.<metrics>=<metrics>

        def call(self,X,training=None):
            x=self.<network>(X,training=training)

        def train_step(self,data):
            if isinstance(data,tuple):
                x,y=data
            # record the forward pass so that gradients can be computed
            with tf.GradientTape() as tape:
                predictions=self(x,training=True)
                loss=self.loss(y,predictions)
            gradients=tape.gradient(loss,self.<network>.trainable_weights)
            self.optimizer.apply_gradients(zip(gradients,self.<network>.trainable_weights))
            return {'loss': loss,
                    ...}


(3)

    import tensorflow as tf

    class callback_custom_checkpoint(tf.keras.callbacks.Callback):

        def on_epoch_end(self, epoch, logs=None):
            tf.keras.models.save_model(self.model,path_to_checkpoint)


Inference

    import tensorflow as tf

    model=tf.keras.models.load_model(path_to_checkpoint,
                                     ...)
    data=...
    predictions=model(data)
Experiment Setup in TensorFlow 2

    # (4) write the data to disk in TFRecords format
    import tensorflow as tf

    def save_data_to_tfrecords_format(...):

        def serialize_example(x, y):
            # x is stored as raw bytes; tobytes() replaces the deprecated tostring()
            feature = {'x': tf.train.Feature(bytes_list=tf.train.BytesList(value=[x.tobytes()])),
                       'y': tf.train.Feature(int64_list=tf.train.Int64List(value=[y]))}
            example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
            return example_proto.SerializeToString()

        def write_serialized_example(x, y, mode):
            filename = ...
            with tf.io.TFRecordWriter(filename) as writer:
                for i in range(number_of_examples):
                    example = serialize_example(x[i], y[i])
                    writer.write(example)

        def get_train_validation_splits(x, y, ...):
            ...
            return x_train, y_train, x_validation, y_validation

        x, y = get_data(...)
        x_train, y_train, x_validation, y_validation = get_train_validation_splits(x, y, ...)

        write_serialized_example(x_train, y_train, 'train')
        write_serialized_example(x_validation, y_validation, 'validation')


    # (5) read the TFRecords files back as a tf.data pipeline
    import tensorflow as tf

    def read_tfrecords(mode, ...):

        feature = {'x': tf.io.FixedLenFeature([], tf.string),
                   'y': tf.io.FixedLenFeature([], tf.int64)}

        def parse_function(example_proto):
            parsed_example = tf.io.parse_single_example(example_proto, feature)
            # decode the raw bytes written by serialize_example
            x = tf.io.decode_raw(parsed_example['x'], tf.float32)
            x.set_shape(shape_of_x)
            y = parsed_example['y']
            return x, y

        dataset = tf.data.TFRecordDataset(filenames)
        dataset = dataset.map(parse_function)
        dataset = dataset.batch(batch_size, drop_remainder=True)
        # shuffling after batching reorders whole batches, not individual examples
        dataset = dataset.shuffle(buffer_capacity, reshuffle_each_iteration=True)

        return dataset
Why Finite Difference Time Domain?

■ The aim of this experiment is to construct a data-driven
model for dynamical systems such as a propagating EM
wave. In other words, the aim is to construct a model that
can produce the next state of a propagating wave given its
past states by learning wave dynamics from a dataset of
propagating waves.

■ Two numerical schemes for solving the PDEs that govern
electromagnetic phenomena are considered here.

The Finite Element Method (FEM) solves the time-harmonic
second-order PDEs for the vector potentials. The FEM
discretizes the computational domain into finite elements
and interpolates the unknown function over each element
using its values at the nodes of that element. It constructs
and solves a linear system to find the unknown function's
values at the nodes.

The Finite Difference Time Domain (FDTD) method solves the
time-dependent first-order Maxwell's PDEs. The FDTD method
is a time-marching scheme that directly advances the
electromagnetic fields in time over a regular spatial grid,
referred to as the Yee grid.

[Figure: Yee Cell]
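To make the idea of time marching concrete, the following is a minimal one-dimensional sketch, not the solver used in this work. It assumes normalized units, a Courant number of 0.5, PEC end walls, and an additive Gaussian source; the function name `fdtd_1d` and all parameter values are hypothetical.

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, courant=0.5):
    """Leapfrog update of a 1D E/H field pair in normalized units."""
    ez = [0.0] * n_cells          # E samples at integer grid points
    hy = [0.0] * (n_cells - 1)    # H samples at half-integer grid points
    for step in range(n_steps):
        # H is advanced from the spatial difference of E
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # interior E is advanced from the spatial difference of H;
        # the two end cells stay at zero, i.e. they act as PEC walls
        for k in range(1, n_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # additive Gaussian pulse injected at the center cell
        ez[n_cells // 2] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez


final_ez = fdtd_1d()
```

Each pass through the outer loop produces the next state of the grid from the current one, which is the behavior a data-driven model would have to reproduce.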
Split-PML ABC: 2D TEz Mode

Media with Anisotropy in Conductivities: 2D TEz Mode

[Figure panels: Maxwell's Equations; Matched Medium; Electric Conductivity Profile]
Experiment-I:
Emulation of the FDTD using RNNs 


Why Finite Difference Time Domain? 


■ The main reason that the FDTD method is a
suitable numerical scheme for experimentation
is that its output is a sequence of grid values.
The grid of field values can be interpreted as a 
grid of pixels, i.e., an image. Then, the tem- 
poral variations of the grid values constitute a 
sequence of images, i.e., a video. 

This interpretation connects the domain of 
Computational EM (CEM) with the domain of 
Computer Vision (CV) and recasts the FDTD 
scheme for time-marching of EM fields as the 
task of predicting the future frames of a 
video. 


■ With the aim of constructing a data-driven
model of EM dynamics, we investigate
whether FDTD's set of update equations can 
be substituted with an RNN. In other words, 
whether an RNN can produce a sequence of grid 
values, i.e., a video, that evolve as electromag- 
netic waves. 

As examples, we consider the problem of plane 
wave propagation and plane wave scattering 
from perfect electric conductors (PECs) and set 
up experiments in which an RNN is first trained 
with a set of FDTD videos and then, in the in- 
ference phase, is presented with an FDTD video 
(which it has not seen during training) and is 
asked to produce its next few frames. 
■ Each dataset comprises 100 simulations
with a Gaussian pulse whose highest
frequency component is 2 GHz.
In each simulation, all three field
components $E_x$, $E_y$, $H_z$ are recorded within
a region of 128 x 128 cells for 400 time
steps. Finally, all frames in which
the excitation has not yet fully entered
the computational domain are trimmed from
the beginning of the recording.

■ In each dataset type, 75 simulations are
allocated to training and 25 to testing.
The size of the training sets depends
on the size of the videos (which is a
hyperparameter). The total number of
simulated frames available for training
is 18,750 for dataset Type-1 and 15,750
for datasets Type-2 and Type-3.
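The frame totals can be related to the length of each trimmed simulation with a short calculation. This sketch assumes that all 75 training simulations of a dataset type contribute the same number of frames, which the slides do not state explicitly; the helper name is hypothetical.

```python
def frames_per_simulation(total_training_frames, n_training_simulations=75):
    """Frames kept per simulation, assuming an equal share for every simulation."""
    frames, remainder = divmod(total_training_frames, n_training_simulations)
    if remainder:
        raise ValueError("total is not an integer multiple of the simulation count")
    return frames


RECORDED_STEPS = 400  # time steps recorded per simulation

type1_frames = frames_per_simulation(18750)    # dataset Type-1
type23_frames = frames_per_simulation(15750)   # datasets Type-2 and Type-3

# frames trimmed from the beginning of each simulation under this assumption
type1_trimmed = RECORDED_STEPS - type1_frames
type23_trimmed = RECORDED_STEPS - type23_frames
```

Under this assumption each Type-1 simulation keeps 250 of its 400 frames, and each Type-2 or Type-3 simulation keeps 210.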


Dataset: Type-1 


University of New Mexico 
Computational EM Lab 


For dataset Type-1, the TF/SF excitation is
used to sweep a planar wavefront (angle of
propagation chosen randomly between 20°
and 70°) across the domain of interest and
scattered waves from PEC objects are observed.
PEC objects are a random mix of
circular and square shapes with sizes chosen
randomly between $0.4\lambda_{\min}$ and $0.6\lambda_{\min}$.


Dataset: Type-2 




For dataset Type-2, a point source is applied 
at a random location resulting in a spherical 
wavefront which then propagates across the 
domain of interest and scatters from PEC 
objects. PEC objects are a random mix of 
circular and square shapes with sizes chosen
randomly between $0.4\lambda_{\min}$ and $0.6\lambda_{\min}$.


Dataset: Type-3 




For dataset Type-3, two point sources are 
applied at random locations resulting in a 
complex wavefront pattern (formed by the 
sum of the fields from each source at each 
point in time) which then propagates across 
the domain of interest and scatters from a 
circular PEC object of fixed size and fixed 
location (at the bottom left corner of the 
domain). 
Encoder-Recurrent-Decoder (ERD)

■ The ERD model comprises a convolutional
encoder, a convolutional LSTM and a
convolutional decoder.
When a video is presented to the model, 
the encoder extracts its first frame (or al- 
ternatively, its last frame if operating in the 
Continuous Prediction mode in the infer- 
ence phase) and reduces that frame to a set 
of latent space features that are then fed to 
the LSTM. With the extracted features as 
initial input, the hidden state of the LSTM 
is updated for a specific number of times to 
produce a stack of features that can be inter- 
preted as future frames in the latent space. 
Finally, the stack of features is fed to the de- 
coder to construct complete future frames of 
the EM fields for that video. 


[Figure: ERD model, with features extracted from boundary data and features extracted from true initial states]


■ To incorporate the object information into
the learned representations, and to ensure
that the objects remain in their positions
while the dynamics of wave propagation
and scattering unfold, the model is provided
with information about the object in the
form of a binary background (1's for locations
where the PEC object exists and 0's
elsewhere) alongside the input time-domain
field data. Furthermore, the LSTM is modified
accordingly to incorporate the object
information.
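As an illustration of the binary background described above, the sketch below builds a mask for a single circular PEC object on a small grid. The function name, grid size, center, and radius are hypothetical and chosen only to keep the example readable.

```python
def circular_pec_mask(size, center, radius):
    """Return a size x size grid with 1 inside the circular object and 0 elsewhere."""
    cy, cx = center
    return [
        [1 if (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2 else 0
         for c in range(size)]
        for r in range(size)
    ]


mask = circular_pec_mask(size=8, center=(4, 4), radius=2)
object_cells = sum(sum(row) for row in mask)
```

Such a mask has the same spatial shape as a field frame, so it can be supplied alongside the field data as an additional input channel.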


[Figure: network inputs (E field, TEz mode: true initial state and boundary data) and network outputs]
■ Inference

Continuous Prediction refers to a mode
of prediction in which the model predicts
multiple frames into the future from a single
initial true frame by continuously feeding its
output back to its input.

Test Prediction refers to a mode of prediction
in which the model predicts only the
next frame from the true frame, i.e., for
each prediction it requires the previous true
frame.


■ Error

For each frame (i.e., time-step), denoted by
$f_i$, the error is defined as $|f_i^{FDTD} - f_i^{ERD}|^2$.
The color scale on error plots is clamped
to a fraction of the quantity denoted by
$N^{ERD}_{Frob,max}$ and computed as
$N^{ERD}_{Frob,max} = \max\left(\lVert f_i^{FDTD} - f_i^{ERD} \rVert_{Frob}\right)$.
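A small sketch of these two quantities follows, assuming the maximum is taken over the frames of a sequence (the slide does not say so explicitly). Frames are plain nested lists here and all function names are hypothetical.

```python
import math


def squared_error(true_frame, pred_frame):
    """Per-cell squared difference between the true and predicted frames."""
    return [
        [(t - p) ** 2 for t, p in zip(true_row, pred_row)]
        for true_row, pred_row in zip(true_frame, pred_frame)
    ]


def frobenius_norm_of_difference(true_frame, pred_frame):
    """Frobenius norm of (true - predicted) for a single frame."""
    return math.sqrt(sum(sum(row) for row in squared_error(true_frame, pred_frame)))


def max_frobenius(true_frames, pred_frames):
    """Largest Frobenius norm of the difference over a sequence of frames."""
    return max(
        frobenius_norm_of_difference(t, p)
        for t, p in zip(true_frames, pred_frames)
    )


true_frames = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
pred_frames = [[[1, 2], [3, 1]], [[3, 4], [0, 0]]]
error_map = squared_error(true_frames[0], pred_frames[0])
n_max = max_frobenius(true_frames, pred_frames)
```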


■ Speedup


Inference latency is 22 ms. A single up- 
date of the in-house developed FDTD solver 
takes 373 ms. The ERD model exhibits a 
17x speedup. 


[Figure: Continuous Prediction and Test Prediction results comparing FDTD with the DNN prediction; panels show TEz TRUE Power, TEz PRED Power, and |TRUE-PRED|² for selected frames, including frames 11 and 14]

Antennas & RF Lab, Electrical & Computer Engineering

repo: gitlab.com/oameed/unm_cem_dlfdtd


DDM-based Parallelization

■ We propose to use the ERD model as a computation
tool in a Domain Decomposition Method (DDM) based
parallelization scheme. In this scheme, the entire
computational domain is divided into smaller regions,
called sub-domains, and computations are carried out in
parallel over each. Since events that occur in each
sub-domain are similar to the ones that occur in others,
e.g., a wavefront propagates from one corner to another
or a wavefront impinges on an object and scatters, a
data-driven model of EM dynamics can be used for
generating the future states of all sub-domains.

[Figure: field components at Frame k and Frame k+1 for datasets Type-4 and Type-5]
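The decomposition step can be sketched as follows. This is an illustration only: `predict_next` stands in for any next-frame model, the exchange of boundary information between neighboring sub-domains is omitted, and all names are hypothetical.

```python
def split_into_subdomains(grid, tile):
    """Split a square grid (list of rows) into tile x tile sub-domains keyed by (row, col)."""
    n = len(grid)
    if n % tile:
        raise ValueError("grid size must be a multiple of the tile size")
    return {
        (i // tile, j // tile): [row[j:j + tile] for row in grid[i:i + tile]]
        for i in range(0, n, tile)
        for j in range(0, n, tile)
    }


def advance_domain(grid, tile, predict_next):
    """Apply the same next-frame model independently to every sub-domain."""
    subdomains = split_into_subdomains(grid, tile)
    return {key: predict_next(sub) for key, sub in subdomains.items()}


small_grid = [[r * 4 + c for c in range(4)] for r in range(4)]
small_tiles = split_into_subdomains(small_grid, 2)

n_tiles_256 = len(split_into_subdomains([[0] * 256 for _ in range(256)], 128))
n_tiles_512 = len(split_into_subdomains([[0] * 512 for _ in range(512)], 128))
```

Because every sub-domain is processed by the same model and the calls are independent, the dictionary comprehension in `advance_domain` is the part that could be run in parallel.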


■ To implement this scheme: (1) cast Maxwell's
equations in the form of a system of linear
differential equations, whose solution can be
expressed as the product of a matrix exponential
and the vector of initial conditions; (2) choose
a block-diagonal matrix as the preconditioner,
so that the matrix-vector multiplication involving
a large matrix exponential is transformed into a
set of parallel matrix-vector multiplications
involving smaller matrix exponentials; (3) use a
generalized Runge-Kutta (RK) method for integration.
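The property that makes the block-diagonal choice attractive can be written out as below. The symbols $\mathbf{u}$, $A$, $D$ and $A_k$ are notation introduced here for illustration; how exactly the preconditioner enters the generalized RK integrator is not specified on the slide and is not reproduced.

```latex
% Linear system and its formal solution
\frac{d\mathbf{u}}{dt} = A\,\mathbf{u}, \qquad \mathbf{u}(t) = e^{tA}\,\mathbf{u}(0)

% Exponential of a block-diagonal matrix acts block by block
D = \operatorname{diag}(A_1, \dots, A_n)
\;\Longrightarrow\;
e^{tD} = \operatorname{diag}\!\left(e^{tA_1}, \dots, e^{tA_n}\right)

% Hence the product splits into n independent, smaller products
e^{tD}\,\mathbf{u} =
\begin{bmatrix} e^{tA_1}\mathbf{u}_1 \\ \vdots \\ e^{tA_n}\mathbf{u}_n \end{bmatrix}
```

Each block $e^{tA_k}\mathbf{u}_k$ involves only the unknowns of one sub-domain, so the blocks can be evaluated in parallel.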


■ Speedup

The run time of the DDM parallelization scheme
exhibits a 2x speedup when the ERD model is
used for sub-domain computations, compared to
its pure form which uses matrix exponential
operations.


Dataset: Type-4

This dataset contains two-time-step simulations
of size 256 x 256 (which can be divided
into 4 subdomains of size 128 x 128). A
single point source excites a spherical wavefront
which scatters from a single circular
PEC object of size $0.5\lambda_{\min}$ and eventually
dissipates in the PML walls. The relative
locations of the point source and the PEC
object were changed based on all unique
combinations for which source and object
remained completely in a subdomain.

In the interval between frames 180 to 
250, every five frames, two frames were 
recorded resulting in a total of 90 simu- 
lations. 67 simulations were randomly se- 
lected for training and the rest were assigned 
to testing. 


[Figure: Power at time steps 420 and 421]

Dataset: Type-5

This dataset contains two-time-step simulations
of size 512 x 512 (which can be divided
into 16 subdomains of size 128 x 128).
A single point source, the location of which
is fixed at the lower right of the domain,
excites a spherical wavefront which scatters
from a single circular PEC object of
size $0.5\lambda_{\min}$ (the location of which is fixed
within a subdomain near the middle of the
domain) and eventually dissipates in the
PML walls.

In the interval between frames 320 to 450, 
every ten frames, two frames were recorded 
resulting in a total of 14 simulations for test- 
ing. 


[Figure: Power computed by FDTD, Exponential DDM-FDTD, and DDM-FDTD with DNN Model]


repo: 
■ The aim of this experiment is to construct a data-driven
model for dynamical systems such as a propagating EM
wave. In other words, the aim is to construct a model that 
can produce the next state of a propagating wave given its 
past states by learning wave dynamics from a dataset of 
propagating waves. 


■ The ERD architecture proved a capable model for emulating
the dynamics of EM propagation and scattering. The
speedup in computations enabled by its incorporation into 
the DDM-based parallelization scheme is a demonstrated 
example of the benefits that ML can bring to conventional 
computational workflows. 

In the context of the DDM-based parallelization as an ap- 
plication, the task of the ERD model is to provide the next 
time-step in each sub-domain. As a result, it can be ar- 
gued that any data-driven model that can do next-frame 
prediction can be utilized for this application. Therefore, 
in answering the second fundamental question "What are 
the domain-specific machine learning algorithms required 
to convert various datasets to modeling knowledge?”, the 
ERD architecture is, certainly, one of the domain-specific 
ML algorithms that can be used to convert a dataset of 
propagating and scattering waves into modeling knowledge. 
However, it might not be the only one. 


■ We search for other data-driven models that can function in
the same way as (and therefore replace in a modular fashion)
the ERD model in the DDM-based parallelization scheme. 
New architectures have recently been introduced that are 
also capable of processing sequential data. 

The Transformer is one such architecture which was de- 
veloped in the field of NLP for translation tasks. In that 
context, it takes a segment of text (i.e., a sequence of to- 
kens) in one language and outputs its translation in another 
(which is itself a sequence of tokens). 


Recasting FDTD as Video Prediction

■ With the aim of constructing a data-driven
model of EM dynamics, we investigate
whether FDTD's set of update equations can
be substituted with a Transformer. In other
words, whether a Transformer can produce a
sequence of grid values, i.e., a video, that evolve
as electromagnetic waves.

As examples, we consider the problem of plane
wave propagation and plane wave scattering
from perfect electric conductors (PECs) and set
up experiments in which a Transformer is first
trained with a set of FDTD videos and then, in
the inference phase, is presented with an FDTD
video (which it has not seen during training)
and is asked to produce its next few frames.
Emulation of the FDTD using Transformers

■ Starting from a sequence of embeddings, positional
encoding is added to give the model information regarding
the relative or absolute position of the elements in the
sequence. The result is then fed to the encoder which will
map it to a representation sequence.

Output generation is an auto-regressive process. It starts
with feeding the decoder with a one-element input sequence
along with the encoder's output. The last element
in the decoder's output is extracted, embedded, and
appended to the input sequence. This process continues with
the decoder taking in the entire previously predicted output
sequence along with the encoder's output in each step.

In each step, the decoder's output sequence has the same
number of elements as its input. In the training phase, the
decoder has learned to shift input sequence elements forward
by one position. Therefore, in each step of output generation,
only the last element in the output sequence contains
new information.
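The auto-regressive loop described above can be sketched with stand-in components. `encoder` and `decoder` are hypothetical callables, the embedding of the extracted element is taken to be the identity, and the toy decoder simply adds one to every element so that the "shift forward by one position" behavior is visible.

```python
def generate(encoder, decoder, context_sequence, start_element, n_future):
    """Auto-regressive generation: append the last decoder output at every step."""
    memory = encoder(context_sequence)         # computed once from the context
    decoder_input = [start_element]            # one-element initial input sequence
    for _ in range(n_future):
        decoder_output = decoder(decoder_input, memory)  # same length as its input
        decoder_input.append(decoder_output[-1])         # only the last element is new
    return decoder_input[1:]


# toy stand-ins: an identity encoder and a decoder that shifts a counting sequence
toy_encoder = lambda sequence: sequence
toy_decoder = lambda sequence, memory: [element + 1 for element in sequence]

future = generate(toy_encoder, toy_decoder,
                  context_sequence=[1, 2, 3], start_element=3, n_future=3)
```

Note that the decoder is re-run on the whole growing input at each step, which is why generation cost increases with the length of the predicted sequence.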


[Figure: Transformer encoder and decoder (positional encoding, GlobalSelfAttention, FFN; context sequence, context sequence from encoder, input sequence) and detail of Attention Head 1]


■ We divide each sequence into past and future segments.
The past segment plays the role of the context sequence
and is fed to the encoder. Also, its last frame is fed to the
decoder as the initial frame for the prediction of the future
sequence. With this proposed scheme, we translate the
past into the future using the Transformer architecture.

■ Self-Attention produces a representation of the relatedness
of different positions of a sequence without regard to
their distance.
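A minimal sketch of this idea is given below, assuming the commonly used scaled dot-product form of attention (the slide does not spell out the formula). Learned projections and multiple heads are omitted, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    weights = [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
    outputs = [
        [sum(w * v[j] for w, v in zip(weight_row, values))
         for j in range(len(values[0]))]
        for weight_row in weights
    ]
    return outputs, weights


# positions 0 and 2 hold the same vector; position 1 holds a different one
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(sequence, sequence, sequence)
```

In this example position 0 attends to position 2 exactly as strongly as to itself, although they are two steps apart, because the weights depend only on the content of the vectors.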
■ Implementation

The hyperparameter values in our implementation
are as follows: both the encoder
and the decoder consist of 3 layers; $d_m = 256$
and $n_h = 8$.
This configuration results in the largest possible
model that we can implement using
a single GPU. Larger values for the
above-mentioned hyperparameters will cause the
GPU to throw an OOM (Out Of Memory)
error and abort the training. The size of the
implemented model with the configuration
mentioned above is about 60M (million)
parameters.


■ Error

For each frame (i.e., time-step), denoted by
$f_i$, the error is defined as $|f_i^{FDTD} - f_i^{ERD}|^2$.
The color scale on error plots is clamped
to a fraction of the quantity denoted by
$N^{ERD}_{Frob,max}$ and computed as
$N^{ERD}_{Frob,max} = \max\left(\lVert f_i^{FDTD} - f_i^{ERD} \rVert_{Frob}\right)$.

[Figure: Continuous Prediction; True and Predicted panels]

■ Speedup

For a batch size of 25, the average per-frame
inference latency is 25.30 ms. The values
of this quantity for the in-house developed
FDTD solver and MEEP are 373 ms and
42.61 ms respectively. Therefore, the Transformer
exhibits speedups of 14x and 1.5x
respectively.
■ The aim of this experiment is to construct a data-driven
model for dynamical systems such as a propagating EM
wave. In other words, the aim is to construct a model that 
can produce the next state of a propagating wave given its 
past states by learning wave dynamics from a dataset of 
propagating waves. 


■ The ERD architecture proved a capable model for emulating
the dynamics of EM propagation and scattering. The
speedup in computations enabled by its incorporation into 
the DDM-based parallelization scheme is a demonstrated 
example of the benefits that ML can bring to conventional 
computational workflows. 

In the context of the DDM-based parallelization as an ap- 
plication, the task of the ERD model is to provide the next 
time-step in each sub-domain. As a result, it can be ar- 
gued that any data-driven model that can do next-frame 
prediction can be utilized for this application. Therefore, 
in answering the second fundamental question "What are 
the domain-specific machine learning algorithms required 
to convert various datasets to modeling knowledge?", the 
ERD architecture is, certainly, one of the domain-specific 
ML algorithms that can be used to convert a dataset of 
propagating and scattering waves into modeling knowledge. 
However, it might not be the only one. 


■ We search for other data-driven models that can function in
the same way as (and therefore replace in a modular fashion)
the ERD model in the DDM-based parallelization scheme. 
New architectures have recently been introduced that are 
also capable of processing sequential data. 

The Graph Convolutional Network (GCN) is one such 
architecture. It constructs representations by updating the 
attribute vector of each node in the graph using information 
from its neighbors. Successive operation of a GCN on a 
graph produces a sequence of attribute vectors. 


Recasting FDTD as Video Prediction

■ With the aim of constructing a data-driven
model of EM dynamics, we investigate
whether FDTD's set of update equations can
be substituted with a GCN. In other words,
whether a GCN can produce a sequence of grid
values, i.e., a video, that evolve as
electromagnetic waves.

As examples, we consider the problem of plane
wave propagation and plane wave scattering
from perfect electric conductors (PECs) and set
up experiments in which a GCN is first trained
with a set of FDTD videos and then, in the
inference phase, is presented with an FDTD video
(which it has not seen during training) and is
asked to produce its next few frames.
Graph Neural Network

■ GNNs were developed to handle graph-structured data such
as connectivity in social networks. Unstructured data types
such as images can be interpreted as graphs. The pixels
can be represented by the nodes and the spatial adjacency
of the pixels can be represented by edges while the red,
green, and blue pixel values will constitute the vector of
attributes for each node.

■ The operations of a GCN are as follows: First, each node
prepares a message for each of its neighbors from the
current state of its attribute vector and then passes those
messages to their intended recipients. Once messages from all
their neighbors are received, each node will aggregate them
and then use the resulting representation to update the
current state of its attribute vector.

■ Field components in each cell of the Yee grid are updated
using information from neighboring cells. We represent the
Yee grid as a graph in which nodes are the cells, node
attribute vectors comprise the field components and edges
connect neighboring cells.

As nodes exchange information (in the form of messages)
with each application of the GCN to the graph, fields will
progress forward in time by one step.

[Figure: adjacency list and graph convolution layer applied to a grid graph]
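The message, aggregate, and update steps can be sketched on a tiny graph. In this illustration node attributes are scalars, messages are the senders' current states, aggregation is a mean, and the update rule is a fixed average; in a trained GCN these steps would involve learned weights. All names are hypothetical.

```python
def message_passing_step(attributes, neighbors, update):
    """One round of message passing: gather, aggregate (mean), then update."""
    new_attributes = {}
    for node, state in attributes.items():
        # every neighbor sends its current state as the message
        messages = [attributes[sender] for sender in neighbors[node]]
        aggregated = sum(messages) / len(messages) if messages else 0.0
        new_attributes[node] = update(state, aggregated)
    return new_attributes


# path graph 0 - 1 - 2 with a disturbance initially located at node 0
neighbors = {0: [1], 1: [0, 2], 2: [1]}
attributes = {0: 1.0, 1: 0.0, 2: 0.0}
mix = lambda state, aggregated: 0.5 * state + 0.5 * aggregated

after_one = message_passing_step(attributes, neighbors, mix)
after_two = message_passing_step(after_one, neighbors, mix)
```

After one step only the immediate neighbor of node 0 has changed, and node 2 is reached after the second step, mirroring how a local update rule carries a field across the grid one step at a time.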


Experiment-III:
Emulation of the FDTD using GNNs

■ Implementation

The hyperparameter values in our implementation
are as follows: since our frame
size is 128 x 128 pixels, the graph has
$n_n = 16,384$ nodes and $n_e = 65,024$ edges.

Therefore, we set $d_m = 32$ which is the
largest possible model size that we can implement
using a single GPU. Larger values
for $d_m$ will cause the GPU to throw an
OOM (Out Of Memory) error and abort the
training. The size of the implemented model
with the configuration mentioned above is
about 13K parameters.

[Figure: Test Prediction; True and Predicted panels]
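The node and edge counts quoted for a 128 x 128 frame can be reproduced by building the grid graph explicitly. The sketch assumes a 4-neighborhood (up, down, left, right) with one directed edge per ordered pair of adjacent cells; this assumption matches the quoted numbers but is not stated on the slide.

```python
def grid_graph(height, width):
    """Return the node count and the directed edge list of a 4-connected grid."""
    edges = []
    for r in range(height):
        for c in range(width):
            node = r * width + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    edges.append((node, rr * width + cc))
    return height * width, edges


n_nodes, edges = grid_graph(128, 128)
n_edges = len(edges)
```

The edge count follows from 2 x 128 x 127 undirected neighbor pairs, each counted once in each direction.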


■ Error

For each frame (i.e., time-step), denoted by
$f_i$, the error is defined as $|f_i^{FDTD} - f_i^{ERD}|^2$.
The color scale on error plots is clamped
to a fraction of the quantity denoted by
$N^{ERD}_{Frob,max}$ and computed as
$N^{ERD}_{Frob,max} = \max\left(\lVert f_i^{FDTD} - f_i^{ERD} \rVert_{Frob}\right)$.


■ Speedup

The average per-frame inference latency is
43.17 ms. The values of this quantity
for the in-house developed FDTD solver
and MEEP are 373 ms and 42.61 ms
respectively. Therefore, the GNN exhibits a
speedup of 9x compared to the former and
no speedup compared to the latter.

[Figure: TEz TRUE Power, TEz PRED Power, and |TRUE-PRED|² for frame 13]
■ The aim of this experiment is to construct a
data-driven knowledge database, e.g., one that
contains application-specific near-optimal an- 
tenna designs. In other words, the aim is to con- 
struct a model that can produce new desirable 
antenna designs by learning the data-generating 
distribution of a dataset of desirable designs. 


Experiment-IV: 
Device Modeling using GANs 


Recasting Antenna Design as Synthetic Data Generation


■ With the aim of constructing a data-driven
knowledge database, we investigate whether
a Generative Adversarial Network (GAN) can 
learn the distribution of a dataset of near- 
optimal antenna designs. In other words, 
whether a GAN can produce new antenna de- 
signs that exhibit a desired response. 

As an example, we consider the problem of 
designing the Log-Periodic Folded Dipole Ar- 
ray (LPFDA) antenna for two non-overlapping 
ranges of Q-factor values. We represent each 
antenna with the vector of its structural param- 
eters and consider each desirable range of the 
Q-factor as a class therefore recasting our prob- 
lem as the task of generating new samples 
from a given class. 

We set up experiments in which a Conditional 
GAN (CGAN) and a library of vanilla GANs 
are first trained with a dataset of near-optimal 
structures and then, in inference phase, are 
asked to generate new LPFDA designs from 
each Q-factor range. 
■ The array is designed to operate from 350 MHz
to 1 GHz and its elements are printed on a
substrate with $\epsilon_r = 2.33$ and $h = 0.786$ mm. The
structure is fed from the bottom using a coaxial
cable that drives the center conductor.

Ten structural parameters were used to form
the antenna's vector representation
$X = [x_2, x_3, x_4, x_5, c_2, c_3, c_4, c_5, a_1, a_2]$
while $x_1$, $x_6$, $x_7$, $c_1$, $c_6$, $c_7$ and $c_8$
were kept constant in the design.

Since LPFDA is a wide-band antenna, we define
its Q-factor as the average of Q-factors computed
at frequencies that are sampled from its
operational bandwidth.

■ Datasets rfant-1 and rfant-2 are used to train
each vanilla GAN separately while rfant-3 was
used to train the CGAN.

Dataset: rfant-1
This dataset contains 500 designs for each of
which 40 < Q < 80.

Dataset: rfant-2
This dataset contains 500 designs for each of
which Q < 40.

Dataset: rfant-3
This dataset is created by the union of rfant-1
and rfant-2, with label 1 assigned to examples
from rfant-1 and label 0 assigned to examples
from rfant-2.

Variable   Range (mm)
x1         [value lost] (constant)
x2         29.00 to 38.00
x3         26.00 to 37.00
x4         23.00 to 30.00
x5         22.00 to 28.00
x6         19.30 (constant)
x7         18.30 (constant)
c1         15.83 (constant)
c2         13.00 to 18.00
c3         12.00 to 15.00
c4         9.00 to 12.00
c5         8.00 to 12.00
c6         7.20 (constant)
c7         6.00 (constant)
c8         4.60 (constant)
a1         1.50 to 2.70
a2         5.50 to 6.50

[Figure: Q-factor Distribution for rfant-1 and rfant-2]


■ GANs comprise two models, a Generator and
a Discriminator. The Generator maps a vector
of random noise to a vector of features. The
Discriminator decides if its input is from the 
dataset (real) or is produced by the Generator 
(fake). 

In the training phase, the Generator learns to 
deceive the Discriminator by producing features 
that increasingly resemble the ones from the 
dataset. In other words, in the training phase, 
the Generator learns an estimate of the distri- 
bution of the dataset and can map random noise 
to examples from that dataset. In the inference 
phase, the Generator will produce new examples 
that are similar to the ones from the dataset as 
they are drawn from the same distribution. 


■ The Vanilla GAN has no mechanism for guiding
the generation of datapoints. In other words, in
the case of a dataset that comprises multiple 
categories of examples, the Vanilla GAN can't 
be instructed to generate datapoints from a spe- 
cific category. 

The Conditional GAN was introduced to ad- 
dress this problem. The Generator of the CGAN 
accepts a noise vector and a class label as in- 
puts. The label is provided as an integer and 
is then embedded (i.e., projected onto a higher- 
dimensional space) and concatenated with the 
noise vector. This mixture of noise and class in- 
formation is mapped to a datapoint after pass- 
ing through the Generator. 
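How the class label and the noise vector are combined can be sketched as follows. The embedding table here is random and fixed, whereas in a trained CGAN it would be a learned layer; the dimensions and helper names are hypothetical.

```python
import random


def make_embedding_table(n_classes, embedding_dim, seed=0):
    """One embedding vector per class label (random stand-in for a learned layer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(n_classes)]


def generator_input(noise, label, embedding_table):
    """Concatenate the noise vector with the embedding of the integer label."""
    return list(noise) + embedding_table[label]


table = make_embedding_table(n_classes=2, embedding_dim=4)
noise = [0.1, 0.2, 0.3]
conditioned = generator_input(noise, label=1, embedding_table=table)
```

The Generator then maps this concatenated vector to a datapoint, so changing only the label while keeping the noise fixed selects the class of the generated example.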


■ Wasserstein GAN (WGAN)


Training of GANs is significantly improved 
by (1) replacing the original GAN's loss with 
Wasserstein loss as a measure of the closeness 
of the predicted distribution and data distribu- 
tion (2) constraining the Discriminator to the 
set of 1-Lipschitz Continuous functions by con- 
fining (clipping) its weights to a compact space 
[—c,c] (3) updating the Discriminator weights 
to optimality before each update of the gener- 
ator weights and (4) using RMSprop algorithm 
for optimization. 
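The first two ingredients can be sketched in a few lines, using the standard form of the Wasserstein objective. The Discriminator scores are given as plain lists of numbers and the clipping bound is an illustrative value; function names are hypothetical.

```python
def critic_loss(scores_real, scores_fake):
    """Wasserstein loss minimized by the Discriminator: mean(fake) - mean(real)."""
    return sum(scores_fake) / len(scores_fake) - sum(scores_real) / len(scores_real)


def generator_loss(scores_fake):
    """Loss minimized by the Generator: it tries to raise the scores of its samples."""
    return -sum(scores_fake) / len(scores_fake)


def clip_weights(weights, c):
    """Confine every weight to the interval [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


d_loss = critic_loss(scores_real=[2.0, 4.0], scores_fake=[1.0, 1.0])
g_loss = generator_loss(scores_fake=[1.0, 1.0])
clipped = clip_weights([-0.5, 0.005, 0.02], c=0.01)
```

In a training loop, `clip_weights` would be applied to the Discriminator parameters after each of its updates, and several Discriminator updates would precede each Generator update.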


■ WGAN with Gradient Penalty (WGAN-GP)


WGAN’s training was further improved by en- 
forcing the 1-Lipschitz Continuity constraint 
through a Gradient Penalty (GP) instead of 
weight clipping. In this improved scheme, i.e., 
WGAN-GP, weight clipping and batch normal- 
ization are removed from the Discriminator and 
a regularizer, i.e., the gradient penalty, is added 
to the Discriminator's loss, and the Adam
algorithm is used for optimization.
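For reference, the Discriminator loss with gradient penalty is commonly written as below. The symbols ($D$ for the Discriminator, $P_r$ and $P_g$ for the data and Generator distributions, $\lambda$ for the penalty weight) are notation introduced here and do not appear on the slides.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\!\left[ D(\tilde{x}) \right]
    - \mathbb{E}_{x \sim P_r}\!\left[ D(x) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\left[
        \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The penalty term pushes the gradient norm of the Discriminator toward 1 at points interpolated between real and generated samples, which is how the 1-Lipschitz constraint is encouraged without clipping weights.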


■ We have adopted the WGAN scheme for our
Conditional GAN and the WGAN-GP scheme
for the library of vanilla GANs. 
Validation of Experiment Setup

■ Implementation


We use the MNIST dataset to validate our ex- 
periment setup. 


■ Accuracy


Each model is used to generate 50 predictions 
for each class. The Q-values for these predic- 
tions were computed and compared to the cri- 
terion for that class. It is observed that the 
accuracies of the generated designs for classes 0 
and 1 are 62% and 80% in the case of the library 
of vanilla WGANs and are 80% and 98% in the 
case of the Conditional WGAN. 
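The accuracy figure for a class is the fraction of generated designs whose computed Q-factor satisfies that class's criterion. The sketch below uses the class definitions of rfant-1 (label 1, 40 < Q < 80) and rfant-2 (label 0, Q < 40); the Q values in the example are made up for illustration and are not results from this work.

```python
# criterion for each class label, taken from the dataset definitions
CRITERIA = {
    1: lambda q: 40 < q < 80,   # rfant-1
    0: lambda q: q < 40,        # rfant-2
}


def class_accuracy(q_values, criterion):
    """Fraction of generated designs whose Q-factor meets the class criterion."""
    hits = sum(1 for q in q_values if criterion(q))
    return hits / len(q_values)


# hypothetical Q-factors of generated designs, for illustration only
accuracy_class_1 = class_accuracy([45, 50, 90, 30], CRITERIA[1])
accuracy_class_0 = class_accuracy([10, 39, 41, 20, 5], CRITERIA[0])
```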


■ Latency


The average latency for predictions is 0.4 ms. 


[Figure: generated MNIST samples]

[Figure: Training Loss]


Synthetic Data Generation for rfant-x 